(2.2.3.2-2) UCB1 algorithm
The UCB1 algorithm is a popular algorithm in thinking engine of computer go and shogi. This algorithm gives us a hint on how to cope with uncertain circumstances.
In the algorithm, we add terms that positively evaluate the uncertainty and then compare the choices. In other words, we include values like curiosity in the evaluation. The curiosity makes us prefer to challenge the less experienced situations. *9 UCB stands for upper confidence bound. In daily life, we express uncertain values in the range form such as "perhaps 10 to 20". A range which includes an uncertain value with a certain probability is called a confidence interval. For example, in the case a value X is in the range of 10 to 20 with 95% probability, we express it that 95% confidence interval of X is 10 to 20. The upper confidence bound points to the larger boundary of this confidence interval. In this example, it is 20. Let's take a look at the diagram.
https://gyazo.com/1a2ba78ccede7883880f0c36cb74ecd7
Fig: Accorging to the UCB1 algorithm, A is better
In ❶, by comparing Task A and B by their average, B is better. However, since this judgment is based on the average of what we have experienced so far, we may fall into the pessimistic misunderstanding. ❷ represents the confidence interval of the benefits of A and B. The confidence interval of A is wider than B. This is because the uncertainty of A is high. The benefit of A may be much larger than the average up to the present time, or it may be much smaller. By comparing on the upper bound of this confidence interval, A is better to choose.
By optimistic judgment like this, we reduce the possibility of falling into a pessimistic misunderstanding, and we balance the exploration and exploitation. Pushing back this idea back to the management context: you should not judge by what you get on average, but what can you get in the best case.
*9: This algorithm is known to be "regret" (difference of return from the case where the omniscient God chooses actions) is smaller than other methods, so we can say "optimism minimizes the regret." en.icon
---
This page is auto-translated from /nishio/(2.2.3.2-2) UCB1 algorithm. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thought to non-Japanese readers.